- From: john@acorn.co.uk (John Bowler)
- Subject: Re: Multiprocessing Archimedes??
- Date: 16 Aug 91 11:10:50 GMT
-
- torq@GNU.AI.MIT.EDU (Andrew Mell) writes:
- >I notice that the Arm3 has a new instruction over the Arm2 which is
- >SWP. It swaps a byte or a word between register and external memory.
- >(uninterruptible between the read and write)
- ^^^^^^^^^^^^^^^
-
- Indeed, but not necessarily not interleavable with other memory operations
- (sorry about the double negative :-). In particular, to fully support the
- SWP on a system with multiple memory bus masters the memory control logic
- which decides which bus master has access to the memory next would have to
- force an interlock between the memory read and memory write of the SWP
- instruction. Now, the ARM3 has a LOCK pin for this, but to support
- multi-processors you need to connect it to something :-).
-
- >All very interesting you might say, but it intrigues me as this sort
- >of instruction is usually only used in multiprocessor systems as a
- >software semaphore.
- >
- >Why did Acorn add this instruction to the Arm3?
-
- Because a long time ago, when we were very young (;-) we tried to write a
- multi-threaded OS (ARX) and we ``found'' (sic, thought) that it was
- spending a lot of time going into supervisor mode and disabling interrupts
- so that it could implement mutexes (for user mode code - including the OS,
- which ran in user mode too). In theory SWP allows user code to implement
- mutexes efficiently.
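
A rough illustration of the idea, modelled in Python rather than ARM code. The `Word` class and the names `try_claim` and `release` are invented for this sketch; `Word.swp` merely stands in for the atomic read-then-write that SWP provides, and nothing here reflects how ARX actually did it.

```python
# Model only: a memory word with an atomic swap, standing in for SWP.
class Word:
    def __init__(self, value=0):
        self.value = value

    def swp(self, new):
        """Store `new` and return the previous contents as one step."""
        old, self.value = self.value, new
        return old


FREE, HELD = 0, 1


def try_claim(mutex):
    # Writing HELD unconditionally is harmless: if the mutex was already
    # held we overwrite HELD with HELD and simply learn that we lost.
    return mutex.swp(HELD) == FREE


def release(mutex):
    # The owner hands the mutex back by swapping FREE in.
    mutex.swp(FREE)
```

The point is that claiming needs only one user-mode instruction, with no trip into supervisor mode to disable interrupts.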
-
- As far as I am concerned the MP aspects of SWP are bonuses (clearly these
- were considered at the same time - or the LOCK pin wouldn't be there).
- Notice that SWP always bypasses the cache; again this is MP support, however
- there is an omission here in that it is impossible to do a (reliable) read
- from external memory (you might get the cache contents instead!)
-
- John Bowler (jbowler@acorn.co.uk)
-
-
- From: john@acorn.co.uk (John Bowler)
- Subject: Re: Multiprocessing Archimedes??
- Date: 19 Aug 91 16:25:33 GMT
-
- julian@bridge.welly.gen.nz writes:
- >john@acorn.co.uk (John Bowler) writes:
- >
- >> Notice that SWP always bypasses the cache; again this is MP support, however
- >> there is an omission here in that it is impossible to do a (reliable) read
- >> from external memory (you might get the cache contents instead!)
- >
- >If you're using it to implement semaphores, this is not a problem, as you'd
- >never need to access the semaphore with any instruction other than SWP.
-
- Yes; there is no problem with the semaphore, but the semaphore must be
- protecting some state which is shared. When a processor has claimed that
- semaphore it probably needs to read the state and to obtain consistent
- results when it reads it. If the data is in cacheable memory the only way
- it can do that is to use sequences of the form:-
-
- SWP rx, rx, [raddr] ; read a value out
- STR rx, [raddr] ; and put it back... :-(
-
- The alternative is to allocate shared data in uncacheable memory. This
- requires some OS intervention (a user program cannot simply allocate
- shareable data structures out of its own heap unless the whole heap is
- uncacheable) and uncacheable data obviously has a performance hit.
-
- >BTW. You wouldn't happen to know the instruction format for SWP, by any
- > chance? If a software emulator can be written for it for ARM2 machines
- > (like the FPE - or even add it to the FPE) then we can all start using
- > it.
-
- RISC iX 1.2 emulates the SWP instruction on machines which do not support
- it. RISC OS doesn't. The assembler syntax is:-
-
- SWP{cond}{B} Rd, Rm, [Rn]
-
- the semantics (except for the cache behaviour and so on) are:-
-
- MOV <temp>, Rm
- LDR{cond}{B} Rd, [Rn]
- STR{cond}{B} <temp>, [Rn]
-
- (ie the SWP Rx, Rx, [Raddr] example above *does* store the *old* Rx value
- in [Raddr]... :-).
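
The three-step semantics above can be traced with a small Python model. This is only a sketch: registers and memory are plain dictionaries, the function names are made up for illustration, and cache and bus-lock behaviour is ignored entirely.

```python
def swp(regs, mem, rd, rm, rn):
    """Word form of SWP Rd, Rm, [Rn], following the three steps above."""
    addr = regs[rn]
    temp = regs[rm]        # MOV <temp>, Rm
    regs[rd] = mem[addr]   # LDR Rd, [Rn]
    mem[addr] = temp       # STR <temp>, [Rn]


def read_uncached(regs, mem, rx, raddr):
    """The SWP-then-STR idiom used to read a shared value reliably."""
    swp(regs, mem, rx, rx, raddr)   # rx gets memory, memory gets old rx
    mem[regs[raddr]] = regs[rx]     # STR rx, [raddr]: put the value back
    return regs[rx]


regs = {"r0": 0x1234, "r1": 0x8000}
mem = {0x8000: 0xCAFE}
swp(regs, mem, "r0", "r0", "r1")
# r0 now holds the old memory word and memory holds the old r0.
```

Because the source register is copied to a temporary before the load, using the same register for Rd and Rm gives a true exchange.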
-
- The instruction format is:-
-
- bit 31 bit 0
- c.o.n.d.0.0.0.1 0.B.0.0.n.n.n.n d.d.d.d.0.0.0.0 1.0.0.1.m.m.m.m
-
- c.o.n.d - the condition
- B - 0 = swap word
- 1 = swap byte
- n.n.n.n - Rn
- d.d.d.d - Rd
- m.m.m.m - Rm
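
For anyone wanting to assemble the instruction by hand, the bit layout above can be turned into a word like this. It is a Python sketch; `encode_swp` is a hypothetical helper, and the value &E for the "always" condition is the standard ARM encoding rather than something stated in this post.

```python
AL = 0xE  # "always" condition code (standard ARM encoding)

# Fixed bits of the pattern: bit 24 set, and 1001 in bits 7..4.
SWP_FIXED = 0x01000090


def encode_swp(rd, rm, rn, byte=False, cond=AL):
    """Build the 32-bit word for SWP{cond}{B} Rd, Rm, [Rn]."""
    for field in (rd, rm, rn, cond):
        if not 0 <= field <= 15:
            raise ValueError("field does not fit in 4 bits")
    return (
        (cond << 28)
        | SWP_FIXED
        | (int(byte) << 22)   # B: 0 = swap word, 1 = swap byte
        | (rn << 16)
        | (rd << 12)
        | rm
    )


word = encode_swp(0, 1, 2)          # SWP R0, R1, [R2]
print(f"&{word:08X}")
```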
-
- Data aborts (from the memory manager) leave Rd/Rm as they were before.
- SWP bypasses the ARM3 cache, although the write operation still updates
- the cache (if the address is cached). I don't know whether the read
- will cause the rest of that part of the cache to be updated (I assume
- not, and the programmer should not care :-)
-
- John Bowler (jbowler@acorn.co.uk)
-
-
-
- From: dseal@armltd.co.uk (David Seal)
- Subject: Re: ARM3 instructions.
- Date: 4 Sep 92 15:01:12 GMT
-
- In article <4422@gos.ukc.ac.uk> amsh1@ukc.ac.uk (Brian May#2) writes:
-
- > I don't have an Archie myself but have used them quite a lot in the past.
- >I was recently mucking about with a friend's A5000, trying to find the new
- >instructions that turned the cache on and off. I found them, they were
- >co-processor instructions with the processor itself as (I think) number 0.
-
- Coprocessor 15, in fact.
-
- > Anyway, as I was disassembling away I found a new instruction (well, I had
- >never come across it before). It was 'SWP' and I imagine it swaps registers
- >with registers, maybe with memory as well? I can't remember. If it does
- >reg<->mem as well, and is uninterruptible, perhaps it is for use as a
- >semaphore in multi-processor systems?
-
- The SWP instruction was new to the ARM2as macrocell. I believe ARM3 was the
- first full chip which contained it. More recent macrocells and chips like
- ARM6, ARM60, ARM600 and ARM610 also contain it.
-
- It only swaps a register with a memory location (either a byte or a word),
- and not two registers. It can however read the new contents of the memory
- location from one register, and write the old contents of the memory
- location to another register - i.e. it doesn't have to do a pure swap. This
- may be the source of your idea that it can swap two registers. It is indeed
- uninterruptible, and yes, it is intended for semaphores.
-
- > Of course I won't be the first person to notice this so I wondered, could
- >someone post some info on this, and also on the co-processor instructions
- >relevant to the CPU itself?
-
- The SWP instruction:
- Bits 31..28: Usual condition field
- Bits 27..23: 00010
- Bit 22: 0 for a word swap, 1 for a byte swap
- Bits 21..20: 00
- Bits 19..16: Base register (addresses the memory location involved)
- Bits 15..12: Destination register (where the old memory contents go)
- Bits 11..4: 00001001
- Bits 3..0: Source register (where the new memory contents come from)
-
- Byte swaps use the bottom byte of the source and destination registers,
- and clear the top three bytes of the destination register. There are
- various rules about how R15 works in each register position, similar to
- those for LDR and STR instructions. The destination and source registers
- are allowed to be the same for a pure swap. I don't know offhand what
- would happen if the base register were equal to one or both of the others,
- but I don't think I'd recommend doing it!
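
The byte-swap behaviour described above can be modelled as follows. This is a Python sketch only: memory is a `bytearray`, registers are a dictionary keyed by number, `swpb` is an invented name, and the R15 special cases are not modelled.

```python
def swpb(regs, mem, rd, rm, rn):
    """Model of SWPB Rd, Rm, [Rn] for ordinary (non-R15) registers."""
    addr = regs[rn]
    new_byte = regs[rm] & 0xFF   # only the bottom byte of the source is used
    regs[rd] = mem[addr]         # old byte, top three bytes of Rd cleared
    mem[addr] = new_byte         # only the addressed byte is written


regs = {0: 0xAABBCCDD, 1: 0x11223344, 2: 4}
mem = bytearray(8)
mem[4] = 0x7F
swpb(regs, mem, 0, 1, 2)
# regs[0] is now &0000007F and mem[4] is &44.
```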
-
- Assembler syntax is (using <> around optional sections):
- SWP<cond><B> Rdest,Rsrc,[Rbase]
-
- The ARM3 cache control registers are all coprocessor 15 registers, accessed
- by MRC and MCR instructions in non-user modes. (They will produce invalid
- operation traps in user mode.)
-
- Coprocessor 15 register 0 is read only and identifies the chip - e.g.:
- Bits 31..24: &41 - designer code for ARM Ltd.
- Bits 23..16: &56 - manufacturer code for VLSI Technology Inc.
- Bits 15..8: &03 - identifies chip as an ARM3.
- Bits 7..0: &00 - revision of chip.
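
Splitting the identification word into those four fields is simple shifting and masking. This is a Python sketch with invented helper names; the example value &41560300 is just the four bytes listed above put together.

```python
def decode_id(value):
    """Split the coprocessor 15 register 0 value into its four byte fields."""
    return {
        "designer": (value >> 24) & 0xFF,
        "manufacturer": (value >> 16) & 0xFF,
        "part": (value >> 8) & 0xFF,
        "revision": value & 0xFF,
    }


def looks_like_arm3(value):
    # Assumption for this sketch: testing the part field alone is enough.
    return decode_id(value)["part"] == 0x03


fields = decode_id(0x41560300)
# The designer and manufacturer bytes are the ASCII codes for 'A' and 'V'.
print(chr(fields["designer"]), chr(fields["manufacturer"]))
```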
-
- Coprocessor 15 register 1 is simply a write-sensitive location - writing any
- value to it flushes the cache.
-
- Coprocessor 15 register 2: a miscellaneous control register.
- Bit 0 turns the cache on (if 1) or off (if 0).
- Bit 1 determines whether user mode and non-user modes use the same address
- mapping. Bit 1 is 1 if they do, 0 if they have separate address
- mappings. It should be 1 for use with MEMC.
- Bit 2 is 0 for normal operation, 1 for a special "monitor mode" in which
- the processor is always run at memory speed and all addresses and data
- are put on the external pins, even if the memory request was satisfied
- by the cache. This allows external hardware like a logic analyser to
- trace the program properly.
- Other bits are reserved for future expansion. Code which is trying to set
- the whole control register (e.g. at system initialisation time) should
- write these bits as zeros to ensure compatibility with any such future
- expansions. Code which is just trying to change one or two bits (e.g.
- turn the cache on or off) should read this register, modify the bits
- concerned and write it back: this ensures that it won't have unexpected
- side effects in the future like turning as-yet-undefined features off.
- This register is reset to all zeros when the ARM3 is reset.
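
The two ways of writing this register (whole-register initialisation versus read-modify-write) might look like this when modelled in Python. The constant and function names are assumptions for illustration; on real hardware the read and write would be MRC and MCR instructions in a non-user mode.

```python
MASK32 = 0xFFFFFFFF

CACHE_ON = 1 << 0        # bit 0: cache enabled
SHARED_MAPPING = 1 << 1  # bit 1: user and non-user modes share a mapping
MONITOR_MODE = 1 << 2    # bit 2: monitor mode


def initial_control(cache_on):
    """Value for a whole-register write: reserved bits written as zero."""
    return SHARED_MAPPING | (CACHE_ON if cache_on else 0)


def set_cache_enable(old, enable):
    """Read-modify-write: change bit 0 only, keep every other bit as read."""
    if enable:
        return (old | CACHE_ON) & MASK32
    return old & ~CACHE_ON & MASK32
```

The second form is what keeps code safe if a later chip gives meaning to one of the reserved bits.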
-
- Coprocessor 15 register 3: controls whether areas of memory are cacheable,
- in 2 megabyte chunks. All accesses to an uncacheable area of memory go
- to the real memory and not to the cache - this is a suitable setting
- e.g. for areas containing memory-mapped IO, or for doubly mapped areas
- of memory.
- Bit 0 is 1 if virtual addresses &0000000-&01FFFFF are cacheable, 0 if they
- are not.
- Bit 1 is 1 if virtual addresses &0200000-&03FFFFF are cacheable, 0 if they
- are not.
- :
- :
- Bit 31 is 1 if virtual addresses &3E00000-&3FFFFFF are cacheable, 0 if
- they are not.
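
Since each bit covers 2 megabytes, the bit number for an address is just the address shifted right by 21. Here is a small Python sketch of the mapping; the function names are invented, and the 26-bit limit follows from 32 chunks of 2 megabytes.

```python
CHUNK_SHIFT = 21           # 2 MB = 2**21 bytes
ADDRESS_LIMIT = 1 << 26    # 32 chunks of 2 MB = 64 MB


def chunk_bit(addr):
    """Bit number in registers 3-5 that covers the given virtual address."""
    if not 0 <= addr < ADDRESS_LIMIT:
        raise ValueError("address outside the 64 MB space")
    return addr >> CHUNK_SHIFT


def chunk_range(bit):
    """First and last address covered by the given bit number."""
    start = bit << CHUNK_SHIFT
    return start, start + (1 << CHUNK_SHIFT) - 1


def bit_is_set(mask, addr):
    """True if the chunk containing addr has its bit set in mask."""
    return bool((mask >> chunk_bit(addr)) & 1)
```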
-
- Coprocessor 15 register 4: controls whether areas of memory are updateable,
- in 2 megabyte chunks. All write accesses to a non-updateable area of
- memory go to the real memory only, not to the cache - this is a suitable
- setting for areas of memory that contain ROMs, for instance, since you
- don't want the cached values to be altered by an attempt to write to the
- ROM. (Or, as in MEMC, by an attempt to write to write-only locations
- that share an address with the read-only ROMs.)
- Bit 0 is 1 if virtual addresses &0000000-&01FFFFF are updateable, 0 if
- they are not.
- Bit 1 is 1 if virtual addresses &0200000-&03FFFFF are updateable, 0 if
- they are not.
- :
- :
- Bit 31 is 1 if virtual addresses &3E00000-&3FFFFFF are updateable, 0 if
- they are not.
-
- Coprocessor 15 register 5: controls whether areas of memory are disruptive,
- in 2 megabyte chunks. Any write access to a disruptive area of memory
- will cause the cache to be flushed. This is a suitable setting for areas
- of memory which if written, could cause cache contents to become invalid
- in some way. E.g. on MEMC, writing to the physically addressed memory at
- addresses &2000000-&2FFFFFF will also usually change a virtually
- addressed location's contents: if this location is in cache, a
- subsequent attempt to read it would read the old value. To avoid this
- problem, the physically addressed memory should be marked as disruptive
- in a MEMC system. Similarly, any remapping of memory on a MEMC or other
- memory controller should act disruptively, since the cache contents are
- liable to have become invalid.
- Bit 0 is 1 if virtual addresses &0000000-&01FFFFF are disruptive, 0 if
- they are not.
- Bit 1 is 1 if virtual addresses &0200000-&03FFFFF are disruptive, 0 if
- they are not.
- :
- :
- Bit 31 is 1 if virtual addresses &3E00000-&3FFFFFF are disruptive, 0 if
- they are not.
-
- Coprocessor 15 registers 3-5 are in an undefined state after power-up: they
- must be programmed correctly before the cache is turned on.
-
- Note that you should check that the identity code in coprocessor 15 register 0
- identifies the chip as an ARM3 before assuming that the other registers can
- be used as stated above, unless you are absolutely certain your code can
- only ever be run on an ARM3. Otherwise you are likely to run into problems
- with other chips - e.g. an ARM600 uses the same coprocessor 15 registers to
- control its cache and MMU, but in a completely different way. Just about the
- only thing they do have in common is that coprocessor 15 register 0 contains
- an identification code as described above.
-
- David Seal
- dseal@armltd.co.uk
-
- All opinions are mine only...
-
-
-
- From: mhardy@acorn.co.uk (Michael Hardy)
- Subject: Re: Risc-OS Documentation
- Date: 15 Aug 91 09:45:14 GMT
- Organization: Acorn Computers Ltd, Cambridge, England
-
-
- ARM3 SUPPORT
- ============
-
-
- Introduction and Overview
- =========================
-
- The ARM3Support module provides commands to control the use of the ARM3
- processor's cache, where one is fitted to a machine. The module will
- immediately kill itself if you try to run it on a machine that only has an
- ARM2 processor fitted.
-
-
- Summary of facilities
- ---------------------
-
- Two * Commands are provided: one to configure whether or not the cache is
- enabled at a power-on or reset, and the other to independently turn the
- cache on or off.
-
- There is also a SWI to turn the cache on or off. A further SWI forces the
- cache to be flushed. Finally, there is also a set of SWIs that control how
- various areas of memory interact with the cache.
-
- The default setup is such that all RISC OS programs should run unchanged
- with the ARM3's cache enabled. Consequently, you are unlikely to need to
- use the SWIs (beyond, possibly, turning the cache on or off).
-
-
- Notes
- -----
-
- A few poorly-written programs may not work correctly with ARM3 processors,
- because they make assumptions about processor timing or clock rates.
-
-
- Finding out more
- ----------------
-
- For more details of the ARM3 processor, see the Acorn RISC Machine family
- Data Manual. VLSI Technology Inc. (1990) Prentice-Hall, Englewood Cliffs,
- NJ, USA: ISBN 0-13-781618-9.
-
-
-
-
-
- SWI Calls
- =========
-
-
-
- Cache_Control (SWI &280)
- ========================
-
- Turns the cache on or off
-
-
- On entry
- --------
- R0 = EOR mask
- R1 = AND mask
-
-
- On exit
- -------
- R0 = old state (0 => cacheing was disabled, 1 => cacheing was enabled)
-
-
- Interrupts
- ----------
- Interrupts are disabled
- Fast interrupts are enabled
-
-
- Processor mode
- --------------
- Processor is in SVC mode
-
-
- Re-entrancy
- -----------
- Not defined
-
-
- Use
- ---
- This call turns the cache on or off. Bit 0 of the ARM3's control register 2
- is altered by being masked with R1 and then exclusive ORd with R0: ie new
- value = ((old value AND R1) XOR R0). Bit 1 of the control register is also
- set, forcing the memory controller to use the same translation table for
- both User and Supervisor Modes (as indeed the MEMC chip should). Other bits
- of the control register are set to zero.
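
The AND-then-EOR convention takes a moment to get used to, so here is a Python sketch of the arithmetic only. The helper names are invented for illustration; each returns the (R0, R1) pair that would produce the named effect under the formula above.

```python
MASK32 = 0xFFFFFFFF


def update(old, r0_eor, r1_and):
    """new value = ((old value AND R1) XOR R0), kept to 32 bits."""
    return ((old & r1_and) ^ r0_eor) & MASK32


def read_args():
    return 0, MASK32              # keep every bit, flip none: value unchanged


def set_args(bits):
    return bits, ~bits & MASK32   # clear the bits first, then EOR them in


def clear_args(bits):
    return 0, ~bits & MASK32      # clear the bits and flip nothing


def toggle_args(bits):
    return bits, MASK32           # keep everything, flip the chosen bits
```

On this reading, R0 = 1 with R1 = &FFFFFFFE would turn the cache on, and R0 = 0 with R1 = &FFFFFFFF would just return the old state.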
-
-
- Related SWIs
- ------------
- None
-
-
- Related vectors
- ---------------
- None
-
-
-
- Cache_Cacheable (SWI &281)
- ==========================
-
- Controls which areas of memory may be cached
-
-
- On entry
- --------
- R0 = EOR mask
- R1 = AND mask
-
-
- On exit
- -------
- R0 = old value (bit n set => 2MBytes starting at n*2MBytes are cacheable)
-
-
- Interrupts
- ----------
- Interrupts are disabled
- Fast interrupts are enabled
-
-
- Processor mode
- --------------
- Processor is in SVC mode
-
-
- Re-entrancy
- -----------
- Not defined
-
-
- Use
- ---
- This call controls which areas of memory may be cached (ie are cacheable).
- The ARM3's control register 3 is altered by being masked with R1 and then
- exclusive ORd with R0: ie new value = ((old value AND R1) XOR R0). If bit n
- of the control register is set, the 2MBytes starting at n*2MBytes are
- cacheable.
-
- The default value stored is &FC007FFF, so ROM, the RAM disc and logical
- non-screen RAM are cacheable, but I/O space, physical memory and logical
- screen memory are not.
-
- (You may find a value of &FC007CFF - which disables cacheing the RAM disc -
- gives better performance.)
-
-
- Related SWIs
- ------------
- Cache_Updateable (SWI &282), Cache_Disruptive (SWI &283)
-
-
- Related vectors
- ---------------
- None
-
-
-
- Cache_Updateable (SWI &282)
- ===========================
-
- Controls which areas of memory will be automatically updated in the cache
-
-
- On entry
- --------
- R0 = EOR mask
- R1 = AND mask
-
-
- On exit
- -------
- R0 = old value (bit n set => 2MBytes starting at n*2MBytes are updateable)
-
-
- Interrupts
- ----------
- Interrupts are disabled
- Fast interrupts are enabled
-
-
- Processor mode
- --------------
- Processor is in SVC mode
-
-
- Re-entrancy
- -----------
- Not defined
-
-
- Use
- ---
- This call controls which areas of memory will be automatically updated in
- the cache when the processor writes to that area (ie are updateable). The
- ARM3's control register 4 is altered by being masked with R1 and then
- exclusive ORd with R0: ie new value = ((old value AND R1) XOR R0). If bit n
- of the control register is set, the 2MBytes starting at n*2MBytes are
- updateable.
-
-
- The default value stored is &00007FFF, so logical non-screen RAM is
- updateable, but ROM/CAM/DAG, I/O space, physical memory and logical screen
- memory are not.
-
-
- Related SWIs
- ------------
- Cache_Cacheable (SWI &281), Cache_Disruptive (SWI &283)
-
-
- Related vectors
- ---------------
- None
-
-
-
- Cache_Disruptive (SWI &283)
- ===========================
-
- Controls which areas of memory cause automatic flushing of the cache on a
- write
-
-
- On entry
- --------
- R0 = EOR mask
- R1 = AND mask
-
-
- On exit
- -------
- R0 = old value (bit n set => 2MBytes starting at n*2MBytes are disruptive)
-
-
- Interrupts
- ----------
- Interrupts are disabled
- Fast interrupts are enabled
-
-
- Processor mode
- --------------
- Processor is in SVC mode
-
-
- Re-entrancy
- -----------
- Not defined
-
-
- Use
- ---
- This call controls which areas of memory cause automatic flushing of the
- cache when the processor writes to that area (ie are disruptive). The
- ARM3's control register 5 is altered by being masked with R1 and then
- exclusive ORd with R0: ie new value = ((old value AND R1) XOR R0). If bit n
- of the control register is set, the 2MBytes starting at n*2MBytes are
- disruptive.
-
- The default value stored is &F0000000, so the CAM map is disruptive, but
- ROM/DAG, I/O space, physical memory and logical memory are not. This causes
- automatic flushing whenever MEMC's page mapping is altered, which allows
- programs written for the ARM2 (including RISC OS itself) to run unaltered,
- but at the expense of unnecessary flushing on page swaps.
-
-
- Related SWIs
- ------------
- Cache_Cacheable (SWI &281), Cache_Updateable (SWI &282)
-
-
- Related vectors
- ---------------
- None
-
-
-
- Cache_Flush (SWI &284)
- ======================
-
- Flushes the cache
-
-
- On entry
- --------
- -
-
-
- On exit
- -------
- -
-
-
- Interrupts
- ----------
- Interrupts are disabled
- Fast interrupts are enabled
-
-
- Processor mode
- --------------
- Processor is in SVC mode
-
-
- Re-entrancy
- -----------
- Not defined
-
-
- Use
- ---
- This call flushes the cache by writing to the ARM3's control register 1.
-
-
- Related SWIs
- ------------
- None
-
-
- Related vectors
- ---------------
- None
-
-
-
-
-
- * Commands
- ==========
-
-
-
- *Cache
- ======
-
- Turns the cache on or off, or gives the cache's current state
-
-
- Syntax
- ------
- *Cache [On|Off]
-
-
- Parameters
- ----------
- On or Off
-
-
- Use
- ---
- *Cache turns the cache on or off. With no parameter, it gives the cache's
- current state.
-
-
- Example
- -------
- *Cache Off
-
-
- Related commands
- ----------------
- *Configure Cache
-
-
- Related SWIs
- ------------
- Cache_Control (SWI &280)
-
-
- Related vectors
- ---------------
- None
-
-
-
- *Configure Cache
- ================
-
- Sets the configured cache state to be on or off
-
-
- Syntax
- ------
- *Configure Cache On|Off
-
-
- Parameters
- ----------
- On or Off
-
-
- Use
- ---
- *Configure Cache sets the configured cache state to be on or off.
-
-
- Example
- -------
- *Configure Cache On
-
-
- Related commands
- ----------------
- *Cache
-
-
- Related SWIs
- ------------
- Cache_Control (SWI &280)
-
-
- Related vectors
- ---------------
- None
-
- ******************************************************************************
-
- I hope this helps.
-
- - Michael J Hardy Email: mhardy@acorn.co.uk
-
- Acorn Computers Ltd Telephone: +44 223 214411
- Cambridge TechnoPark Fax: +44 223 214382
- 645 Newmarket Road Telex: 81152 ACNNMR G
- Cambridge CB5 8PB
- England Disclaimer: All opinions are my own, not Acorn's
-
-
-
- From: osmith@acorn.co.uk (Owen Smith)
- Subject: Re: Risc-OS Documentation
- Date: 13 Aug 91 15:06:19 GMT
-
- The ARM3 SWIs really aren't all that interesting, and I've just totally
- failed to find a documentation file for them. However, as a tester, here
- is a bit of BASIC (courtesy of Brian Brunswick) which marks the RAM disc
- area as not cacheable. This in fact makes it go faster.
-
- SYS "Cache_Cacheable", 0, &fffffcff
- SYS "Cache_Updateable", 0, &fffffcff
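
To see what those two calls do to the masks, the arithmetic can be checked in Python. This sketch assumes the registers start at the default values quoted in the ARM3 Support documentation (&FC007FFF for cacheable, &00007FFF for updateable) and that the first SYS argument is the EOR mask and the second the AND mask; `apply` and `chunk_ranges` are names made up here.

```python
MASK32 = 0xFFFFFFFF
DEFAULT_CACHEABLE = 0xFC007FFF   # assumed starting value
DEFAULT_UPDATEABLE = 0x00007FFF  # assumed starting value


def apply(old, eor_mask, and_mask):
    """new value = ((old value AND R1) XOR R0)."""
    return ((old & and_mask) ^ eor_mask) & MASK32


def chunk_ranges(bits):
    """Address ranges of the 2 MB chunks whose bits are set in `bits`."""
    return [(n << 21, ((n + 1) << 21) - 1) for n in range(32) if (bits >> n) & 1]


new_cacheable = apply(DEFAULT_CACHEABLE, 0, 0xFFFFFCFF)
new_updateable = apply(DEFAULT_UPDATEABLE, 0, 0xFFFFFCFF)
cleared = DEFAULT_CACHEABLE & ~new_cacheable

for start, end in chunk_ranges(cleared):
    print(f"no longer cacheable: &{start:07X}-&{end:07X}")
```

Under those assumptions only bits 8 and 9 change, which is the 4 megabytes from &1000000 to &13FFFFF.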
-
- The reason it goes faster is that because such large amounts of data are
- being slurped around, the memory copy loop tends to get flushed out of
- the cache, particularly since it is a long piece of loop unrolled code
- (for speed on an ARM2). So you end up with a cache full of data, very little
- of which is ever accessed again before it gets flushed out of the cache by
- some more data. The loop does an LDM and STM 10 registers at a time in
- RamFS, so in theory there are two words that get cached (ARM3 reads 4 words
- at a time), but this saving is swallowed up by the cache synchronisation
- delays.
-
- You have to be careful though. Brian has his own re-sizing ram disk
- which uses the system sprite area. Marking the system sprite area as not
- cacheable makes it go slower. We (Brian and I) think this is because he
- uses the C function memcpy(), in which the LDM and STM is 4 registers
- at a time. Since this is a multiple of four, it hits the ARM bug where
- it loads 5 words and then throws the fifth one away, which results in
- loading 8 words on an ARM3 (it always reads 4 word chunks even with the
- cache off). So with the cache off, you load 8 then throw 4 away, load the
- next 8 (including the 4 you just threw away) and throw 4 away etc. So
- you are effectively reading all the data twice. With the cache on this
- goes down to once. Yes the code will probably get flushed out, but it
- is a tight loop (not unrolled) so it is not very likely and the cost of
- reloading the code is less than the saving on the data loads.
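
The "reading all the data twice" arithmetic can be checked with a very crude model in Python. It assumes the buffer starts on a 4-word boundary, that the LDM fetches one word more than it needs, and that every memory access with the cache off pulls in a whole aligned 4-word chunk. Those assumptions and the function names are mine, taken from the description above rather than from any ARM3 documentation.

```python
LINE_WORDS = 4  # the ARM3 always reads 4-word chunks


def words_fetched(start_word, nregs, extra_word=True):
    """Words pulled from memory by one LDM of nregs registers, cache off."""
    last = start_word + nregs - 1 + (1 if extra_word else 0)
    first_line = start_word // LINE_WORDS
    last_line = last // LINE_WORDS
    return (last_line - first_line + 1) * LINE_WORDS


def read_ratio(total_words, nregs, extra_word=True):
    """Words fetched per useful word over a whole copy."""
    starts = range(0, total_words, nregs)
    fetched = sum(words_fetched(s, nregs, extra_word) for s in starts)
    return fetched / total_words


print(words_fetched(0, 4))   # one 4-register LDM
print(read_ratio(64, 4))     # whole copy, 4 registers at a time
```

With 4 registers per LDM every transfer spills into the next chunk, so the model fetches 8 words for each 4 used.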
-
- The moral of this is to be careful with the ARM3 SWIs: don't just assume
- that it ought to go faster. Do timings, in lots of different screen
- modes.
-
- Owen.
-
-